{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Feature engineering (variable preparation)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "1. [Definition](#1)\n", "2. [Imputation](#2)\n", "3. [Outliers](#3)\n", "4. [Binning](#4)\n", "5. [Log transformation](#5)\n", "6. [One-hot encoding](#6)\n", "7. [Feature split](#7)\n", "8. [Feature scaling](#8)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Definition\n", "\n", "\n", "\n", "__[What Is Feature Engineering](https://medium.com/mindorks/what-is-feature-engineering-for-machine-learning-d8ba3158d97a)__\n", "\n", "Feature engineering is the process of applying domain knowledge about the data to select or create variables (features) that improve the performance of predictive models. It is best done after Exploratory Data Analysis (EDA).\n", "\n", "## Techniques\n", "\n", "- Imputation: handling missing values, either by dropping them or by finding a suitable replacement value.\n", "- Outlier handling: removing outliers or keeping them.\n", "- Binning: grouping values into classes, typically to turn continuous variables into discrete ones.\n", "- Log transformation: dealing with highly skewed distributions.\n", "- One-hot encoding: converting nominal variables into columns of 0s and 1s.\n", "- Feature split: e.g. splitting a full name into first name and last name.\n", "- __[Feature scaling](https://en.wikipedia.org/wiki/Feature_scaling)__: bringing variables into recommended ranges.\n", "\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![](./01-eda-visual-techniques.png)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Pregnancies Glucose BloodPressure SkinThickness Insulin BMI \\\n", "0 6 148 72 35 0 33.6 \n", "1 1 85 66 29 0 26.6 \n", "2 8 183 64 0 0 23.3 \n", "3 1 89 66 23 94 28.1 \n", "4 0 137 40 35 168 43.1 \n", "\n", " DiabetesPedigreeFunction Age Outcome \n", "0 0.627 50 1 \n", "1 0.351 31 0 \n", "2 0.672 32 1 \n", "3 0.167 21 0 \n", "4 2.288 33 1 " ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import os\n", "import pandas as pd\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "\n", "df = pd.read_csv(os.path.join(\"./csv/diabetes.csv\"))\n", "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Imputation\n", "\n", "" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Pregnancies Glucose BloodPressure SkinThickness Insulin \\\n", "count 768.000000 768.000000 768.000000 768.000000 768.000000 \n", "mean 3.845052 120.894531 69.105469 20.536458 79.799479 \n", "std 3.369578 31.972618 19.355807 15.952218 115.244002 \n", "min 0.000000 0.000000 0.000000 0.000000 0.000000 \n", "25% 1.000000 99.000000 62.000000 0.000000 0.000000 \n", "50% 3.000000 117.000000 72.000000 23.000000 30.500000 \n", "75% 6.000000 140.250000 80.000000 32.000000 127.250000 \n", "max 17.000000 199.000000 122.000000 99.000000 846.000000 \n", "\n", " BMI DiabetesPedigreeFunction Age Outcome \n", "count 768.000000 768.000000 768.000000 768.000000 \n", "mean 31.992578 0.471876 33.240885 0.348958 \n", "std 7.884160 0.331329 11.760232 0.476951 \n", "min 0.000000 0.078000 21.000000 0.000000 \n", "25% 27.300000 0.243750 24.000000 0.000000 \n", "50% 32.000000 0.372500 29.000000 0.000000 \n", "75% 36.600000 0.626250 41.000000 1.000000 \n", "max 67.100000 2.420000 81.000000 1.000000 " ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#df.isnull()\n", "df.describe(include='all')" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Pregnancies Glucose BloodPressure SkinThickness Insulin BMI \\\n", "0 6 148 72 35 0 33.6 \n", "1 1 85 66 29 0 26.6 \n", "2 8 183 64 0 0 23.3 \n", "3 1 89 66 23 94 28.1 \n", "4 0 137 40 35 168 43.1 \n", "\n", " DiabetesPedigreeFunction Age Outcome \n", "0 0.627 50.0 1 \n", "1 0.351 31.0 0 \n", "2 0.672 32.0 1 \n", "3 0.167 NaN 0 \n", "4 2.288 33.0 1 " ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Simulate a missing value: blank out the Age of row 3\n", "df.loc[3, 'Age'] = np.nan\n", "df.head()" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Pregnancies Glucose BloodPressure SkinThickness Insulin \\\n", "count 768.000000 768.000000 768.000000 768.000000 768.000000 \n", "mean 3.845052 120.894531 69.105469 20.536458 79.799479 \n", "std 3.369578 31.972618 19.355807 15.952218 115.244002 \n", "min 0.000000 0.000000 0.000000 0.000000 0.000000 \n", "25% 1.000000 99.000000 62.000000 0.000000 0.000000 \n", "50% 3.000000 117.000000 72.000000 23.000000 30.500000 \n", "75% 6.000000 140.250000 80.000000 32.000000 127.250000 \n", "max 17.000000 199.000000 122.000000 99.000000 846.000000 \n", "\n", " BMI DiabetesPedigreeFunction Age Outcome \n", "count 768.000000 768.000000 767.000000 768.000000 \n", "mean 31.992578 0.471876 33.256845 0.348958 \n", "std 7.884160 0.331329 11.759580 0.476951 \n", "min 0.000000 0.078000 21.000000 0.000000 \n", "25% 27.300000 0.243750 24.000000 0.000000 \n", "50% 32.000000 0.372500 29.000000 0.000000 \n", "75% 36.600000 0.626250 41.000000 1.000000 \n", "max 67.100000 2.420000 81.000000 1.000000 " ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.describe(include='all')" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "#df['Age'].isnull()" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Pregnancies Glucose BloodPressure SkinThickness Insulin BMI \\\n", "3 1 89 66 23 94 28.1 \n", "\n", " DiabetesPedigreeFunction Age Outcome \n", "3 0.167 NaN 0 " ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.loc[df['Age'].isnull()]" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(768, 9)" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.shape" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(768, 9)" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Dropping missing values\n", "# how='all' drops a row only if every column is NaN, so no row is removed here\n", "df.dropna(how='all').shape" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(767, 9)" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# how='any' drops a row if any of the listed columns is NaN (row 3 here)\n", "df.dropna(subset=['Insulin', 'Age'], how='any').shape" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Pregnancies Glucose BloodPressure SkinThickness Insulin BMI \\\n", "0 6 148 72 35 0 33.6 \n", "1 1 85 66 29 0 26.6 \n", "2 8 183 64 0 0 23.3 \n", "3 1 89 66 23 94 28.1 \n", "4 0 137 40 35 168 43.1 \n", "\n", " DiabetesPedigreeFunction Age Outcome \n", "0 0.627 50.0 1 \n", "1 0.351 31.0 0 \n", "2 0.672 32.0 1 \n", "3 0.167 NaN 0 \n", "4 2.288 33.0 1 " ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head()" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Pregnancies Glucose BloodPressure SkinThickness Insulin BMI \\\n", "0 6 148 72 35 0 33.6 \n", "1 1 85 66 29 0 26.6 \n", "2 8 183 64 0 0 23.3 \n", "4 0 137 40 35 168 43.1 \n", "5 5 116 74 0 0 25.6 \n", "\n", " DiabetesPedigreeFunction Age Outcome \n", "0 0.627 50.0 1 \n", "1 0.351 31.0 0 \n", "2 0.672 32.0 1 \n", "4 2.288 33.0 1 \n", "5 0.201 30.0 0 " ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# inplace=True modifies df directly: row 3 is now gone\n", "df.dropna(subset=['Insulin', 'Age'], how='any', inplace=True)\n", "df.head()" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Pregnancies Glucose BloodPressure SkinThickness Insulin BMI \\\n", "0 6 148 72 35 0 33.6 \n", "1 1 85 66 29 0 26.6 \n", "2 8 183 64 0 0 23.3 \n", "3 1 89 66 23 94 28.1 \n", "4 0 137 40 35 168 43.1 \n", "\n", " DiabetesPedigreeFunction Age Outcome \n", "0 0.627 50 1 \n", "1 0.351 31 0 \n", "2 0.672 32 1 \n", "3 0.167 21 0 \n", "4 2.288 33 1 " ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Imputing (assigning) values: reload the original data first\n", "df = pd.read_csv(os.path.join(\"./csv/diabetes.csv\"))\n", "df.head()" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "# Recreate the missing Age value in row 3\n", "df.loc[3, 'Age'] = np.nan" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Pregnancies Glucose BloodPressure SkinThickness Insulin BMI \\\n", "0 6 148 72 35 0 33.6 \n", "1 1 85 66 29 0 26.6 \n", "2 8 183 64 0 0 23.3 \n", "3 1 89 66 23 94 28.1 \n", "4 0 137 40 35 168 43.1 \n", "\n", " DiabetesPedigreeFunction Age Outcome \n", "0 0.627 50.0 1 \n", "1 0.351 31.0 0 \n", "2 0.672 32.0 1 \n", "3 0.167 NaN 0 \n", "4 2.288 33.0 1 " ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head()" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Pregnancies Glucose BloodPressure SkinThickness Insulin BMI \\\n", "0 6 148 72 35 0 33.6 \n", "1 1 85 66 29 0 26.6 \n", "2 8 183 64 0 0 23.3 \n", "3 1 89 66 23 94 28.1 \n", "4 0 137 40 35 168 43.1 \n", "\n", " DiabetesPedigreeFunction Age Outcome \n", "0 0.627 50.0 1 \n", "1 0.351 31.0 0 \n", "2 0.672 32.0 1 \n", "3 0.167 33.0 0 \n", "4 2.288 33.0 1 " ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# df['Age'] = df['Age'].fillna(0)  # Almost never a good idea!\n", "df['Age'] = df['Age'].fillna(round(df['Age'].mean()))  # Rarely a good idea: the mean is pulled by outliers\n", "df.head()" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "# Reload the data and recreate the missing value\n", "df = pd.read_csv(os.path.join(\"./csv/diabetes.csv\"))\n", "df.loc[3, 'Age'] = np.nan" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Pregnancies Glucose BloodPressure SkinThickness Insulin BMI \\\n", "0 6 148 72 35 0 33.6 \n", "1 1 85 66 29 0 26.6 \n", "2 8 183 64 0 0 23.3 \n", "3 1 89 66 23 94 28.1 \n", "4 0 137 40 35 168 43.1 \n", "\n", " DiabetesPedigreeFunction Age Outcome \n", "0 0.627 50.0 1 \n", "1 0.351 31.0 0 \n", "2 0.672 32.0 1 \n", "3 0.167 NaN 0 \n", "4 2.288 33.0 1 " ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head()\n", "#df.shape" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "#df.loc[df['Age'].notnull(),].head()" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "por_embarazos = df.groupby('Pregnancies')\n", "por_embarazos" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: Int64Index([ 4, 16, 45, 57, 58, 59, 66, 78, 83, 102,\n", " ...\n", " 649, 677, 681, 682, 697, 713, 727, 736, 753, 757],\n", " dtype='int64', length=111),\n", " 1: Int64Index([ 1, 3, 13, 18, 19, 27, 46, 50, 51, 55,\n", " ...\n", " 726, 739, 742, 746, 747, 751, 755, 758, 766, 767],\n", " dtype='int64', length=135),\n", " 2: Int64Index([ 8, 38, 47, 60, 63, 67, 70, 79, 81, 85,\n", " ...\n", " 707, 709, 728, 729, 732, 733, 734, 738, 760, 764],\n", " dtype='int64', length=103),\n", " 3: Int64Index([ 6, 20, 31, 32, 40, 80, 108, 110, 126, 132, 140, 166, 169,\n", " 190, 197, 227, 234, 242, 256, 260, 261, 263, 272, 304, 313, 316,\n", " 317, 318, 321, 347, 348, 352, 354, 368, 370, 389, 396, 398, 399,\n", " 415, 419, 431, 480, 494, 501, 504, 514, 515, 521, 524, 525, 527,\n", " 539, 541, 551, 570, 572, 588, 592, 610, 611, 615, 644, 659, 673,\n", " 678, 686, 696, 710, 714, 716, 730, 741, 748, 752],\n", " dtype='int64'),\n", " 4: Int64Index([ 10, 35, 39, 69, 73, 91, 93, 107, 113, 115, 118, 
119, 130,\n", " 144, 151, 160, 167, 168, 184, 198, 199, 228, 230, 233, 235, 241,\n", " 262, 264, 288, 320, 350, 351, 363, 364, 378, 393, 394, 400, 406,\n", " 417, 425, 442, 444, 474, 479, 482, 488, 492, 493, 535, 543, 547,\n", " 549, 568, 604, 625, 629, 641, 643, 666, 683, 698, 699, 704, 720,\n", " 725, 735, 750],\n", " dtype='int64'),\n", " 5: Int64Index([ 5, 14, 29, 30, 52, 62, 65, 71, 77, 84, 116, 117, 123,\n", " 139, 141, 148, 178, 179, 183, 189, 195, 205, 207, 216, 218, 219,\n", " 265, 278, 286, 289, 302, 303, 337, 343, 349, 360, 361, 362, 365,\n", " 386, 388, 391, 402, 404, 437, 457, 463, 496, 546, 628, 636, 652,\n", " 684, 711, 719, 723, 765],\n", " dtype='int64'),\n", " 6: Int64Index([ 0, 33, 95, 98, 121, 165, 170, 171, 176, 180, 204, 217, 231,\n", " 243, 295, 310, 319, 329, 366, 401, 410, 439, 469, 495, 499, 502,\n", " 519, 522, 533, 552, 560, 563, 567, 576, 581, 587, 594, 601, 613,\n", " 616, 622, 642, 664, 668, 670, 675, 701, 705, 749, 759],\n", " dtype='int64'),\n", " 7: Int64Index([ 15, 17, 22, 26, 41, 42, 44, 48, 49, 54, 56, 64, 76,\n", " 82, 92, 114, 155, 161, 185, 192, 209, 212, 222, 223, 236, 276,\n", " 282, 283, 285, 314, 339, 473, 477, 498, 503, 517, 555, 603, 612,\n", " 630, 638, 693, 695, 715, 756],\n", " dtype='int64'),\n", " 8: Int64Index([ 2, 9, 21, 53, 61, 111, 133, 154, 175, 186, 188, 194, 206,\n", " 299, 330, 344, 345, 387, 408, 424, 443, 462, 468, 478, 489, 509,\n", " 540, 545, 557, 583, 584, 586, 662, 674, 690, 731, 737, 754],\n", " dtype='int64'),\n", " 9: Int64Index([ 23, 37, 43, 131, 146, 152, 191, 214, 238, 245, 248, 250, 338,\n", " 355, 403, 459, 460, 512, 516, 523, 618, 663, 669, 676, 708, 743,\n", " 761, 762],\n", " dtype='int64'),\n", " 10: Int64Index([ 7, 11, 12, 25, 34, 143, 246, 270, 281, 306, 327, 458, 464,\n", " 505, 542, 578, 634, 660, 667, 672, 706, 712, 717, 763],\n", " dtype='int64'),\n", " 11: Int64Index([24, 36, 193, 259, 558, 559, 590, 614, 648, 658, 740], dtype='int64'),\n", " 12: Int64Index([215, 254, 333, 358, 
375, 436, 510, 582, 745], dtype='int64'),\n", " 13: Int64Index([28, 72, 86, 274, 323, 357, 518, 635, 691, 744], dtype='int64'),\n", " 14: Int64Index([298, 455], dtype='int64'),\n", " 15: Int64Index([88], dtype='int64'),\n", " 17: Int64Index([159], dtype='int64')}" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "por_embarazos.groups" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### When imputing, prefer the measure of central tendency that is least affected by outliers:\n", "\n", "**The median.**" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Age\n", "Pregnancies \n", "0 25.0\n", "1 24.0\n", "2 25.0\n", "3 27.0\n", "4 30.0\n", "5 36.0\n", "6 36.5\n", "7 41.0\n", "8 43.0\n", "9 44.0\n", "10 40.5\n", "11 45.0\n", "12 46.0\n", "13 43.5\n", "14 42.0\n", "15 43.0\n", "17 47.0" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#por_embarazos.agg({'Age': ['mean','median']})\n", "por_embarazos.agg({'Age': 'median'})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### El gráfico de caja muestra la media o mediana?" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXsAAAEcCAYAAAAmzxTpAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjMsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+AADFEAAAgAElEQVR4nO3dfXxcZZn/8c9lU1JoSwsotaVAUUDT1JVSFrVWJURBEBUfQAO4ZRup5bdWV9htusQV2SVC2QLLsrsgGKQCTUFUVJ6xmeh2AZXyWIgoYimlPGpLaWFLU67fH+dMmKaTZDLnnM6ZnO/79ZpXZs6cuc49T1fuuc597mPujoiIDG9vqXQDREQkeUr2IiIZoGQvIpIBSvYiIhmgZC8ikgFK9iIiGaBkLzsws6vN7NxKt6PSBnodzOxUM1uxs9tUjczsZDO7s9LtyDol+xQzs9Vm9pqZbTKz9WZ2i5ntW+l2FTIzN7MDK92OamZmXWb2f+H7/JKZ/djMJla6XXFx9+vc/ahKtyPrlOzT75PuPgaYCDwPXFrh9iTGAln9TH41fJ8PBsYDFxdbycxG7NRWybCR1S9W1XH3/wNuBKbml5nZODP7gZm9aGZPmdk388nSzC4zsxsL1l1kZsvDhHqEma01s7PCnuRqMzu5v22b2Wlm9oSZ/cXMfmZmk8LlvwpXeSjslX6hyGNHmNmF4Xb+ZGZfDX8N1IT3d5lZm5n9L/Aq8A4zmxRu5y/hdk8riLddaSX/XApurzazfzKzx8JfQ983s1EF9x9nZg+a2QYzu9vM/qrgvulmdr+ZvWJm1wO9j+v/pbFLzexlM/udmTWGC08ws5V9VjzTzG4aJB7u/hfgR8C0gud7mZndamabgQYzqzWzxWa2xsyeN7PLzWzXgm0tMLNnzWydmX258NdXGO+/wl+Jr5jZr83snQWPvcTMnjazjWa20sw+VHDft83shvAz94qZPWpmhxXcv2/4q+RFM/uzmf1nuHy7kpeZvdvM7grf38fN7MSC+44N37tXzOwZM/uHwV4zKZG765LSC7Aa+Gh4fTdgCfCDgvt/APwUGAtMAX4PNBes/3vgVOBDwEvA5PC+I4Ae4CKgFvgIsBl4V3j/1cC54fUjw8ceGq57KfCrgjY4cOAAz2Ee8BgwGdgD+EX4mJrw/i5gDVAP1AAjgV8C/02QbA8BXgQa+7at4Lms7fOarQL2B
fYE/rfguRwKvAC8DxgBzA7XrwV2AZ4CvhG24fPA1sJt9Xlep4avYX79LwAvh9usBf4C1BWs/wDwuX5idQFfDq+/FegEril4vi8DHyTonI0C/h34WbitscDPgfPC9T8OPBe+nrsB1xS+R2G8vwCHh6/3dcCygracAuwV3ndmGGtUeN+3gf8Djg1fv/OAe8P7RgAPEfwiGR22c1bBa7UivD4aeBr423AbhxJ8vurD+58FPhRe3wM4tNLfw+FyqXgDdBngzQkS0SZgQ5hY1gHvCe8bAWwBphas/xWgq+D24eEX+ymgqWD5EWG80QXLbgD+Obx+NW8myHbggoL1xhAkwSnh7cGSfSfwlYLbH2XHZP8vBffvC2wDxhYsOw+4um/bCp5L32Q/r+D2scAfw+uXAf/ap32PE/yz+3D4+lrBfXczcLLvu/5vgC8VbKstvF4PrAdq+4nVRfCrZgPwDEECflvB8y38B28E/5jfWbDsA8CfwutXESb+8PaB7Jjsv9fn9fndAO/feuC94fVvA78ouG8q8FpBG17Mv69FXqt8sv8C8D997v8ucHZ4fQ3B53j3Sn//httFZZz0O97dxxP0Fr8K/NLM3k7QA8z3RvOeAvbJ33D33wBPEiSIG/rEXe/um/s8dlKR7U8q3Ia7bwL+XLidQUwi6MnlPV1kncJlk4C/uPsrfdpW6vb6xit8XvsDZ4YlnA1mtoHgn8uk8PKMhxmn4LEDKbZ+fltLgJPMzIAvATe4+5YBYn3N3ce7+z7ufrK7v9jP83kbQY99ZcFzuD1cDqW93s8VXH+V4B840Ftu6g5LUxuAcQSftf4eOyosye0LPOXuPQM8Rwjeg/f1eQ9OBt4e3v85gn9AT5nZL83sA4PEkxIp2VcJd9/m7j8m6PXOIvjpu5Xgy5O3H0HPEAAz+zuCfxLrgAV9Qu5hZqP7PHZdkU2vK9xG+Ji9CrcziGcJSjh5xUYTFSbMdcCeZja2T9vy29tMkOzy3s6OCrdR+LyeJuhtjy+47ObuHWE79wmTc+FjB1Js/XUA7n4v8DpBCe0kgnJKuQpfn5eA1wjKHvnnMM6DnbtQ2utdVFifbwFOBPYIOxkvE3QWBvM0sF9+X8wg6/2yz3swxt1PB3D337r7p4G9gZvYsZMiZVKyrxIW+DRBHbPb3bcRfBHazGysme0PnAFcG65/MHAuQQ32S8ACMzukT9hzzGyX8Et+HPDDIpteCvytmR1iZrXAd4Bfu/vq8P7ngXcM0PQbgK+b2T5mNp4gmfTL3Z8mKJ+cZ2ajwh2ozQSlDYAHgWPNbM/wF87fFwnzd2Y22cz2BM4Crg+XXwnMM7P3ha/nr8IdgXsB9xCUtr5mZjVm9lmCMthA9g7XH2lmJwB1wK0F9/8A+E+gx91jGZPv7m+Ez+NiM9sbIHxtjw5XuYHg/aozs92Abw0h/FiC1+BFoMbMvgXsXuJjf0Pwj+Z8MxsdvncfLLLezcDBZval8HUbaWZ/HbZ3FwvG5I9z963ARoLOjcRAyT79fm5mmwg++G3AbHd/NLxvPkFP90lgBUFivirsXV0LLHL3h9z9DwRJ75owYUPwc3w9QU/0OoI69+/6btzdlwP/TDBC5FngncAXC1b5NrAk/El+Yt/HEySmO4GHCXZS3kqQUAb6EjcR7HBeB/yEoJ57V3jfNQQ7AleHca8v8vil4X1Phpdzw+dyH3AaQQJ+maDXXQsc6+6vA58lqC+vJ6gt/3iANgL8GjiIoLfdBnze3f9ccP81BKNqovTqi2kBngDuNbONBDu93wXg7rcB/wHkwnXuCR8zUAkp7w7gNoId+08R7IwtVgbaQdj5+CTBPoI1wFqC17Dveq8ARxF8htYRfA4XEbwPEHRMVofPax5BZ0ViYNuXHCULzOwI4Fp3nzzYugls+xjgcnfff9CVy4u/mmBkyy8GWe9bwNEECftgdz8uXL4XwU7MjxDsvL0DOMLdZ4X3v5tgRNIMgh7wP7t70VKDBcMhXyAYU
fKHyE+uDGZWRzA6qbaEeroMY+rZS6LMbNdw7HSNme0DnE3QW6+0vyH4RXMdcLSZTQiX/xfBr6W3EwzNnJ1/QLi/4i6CXw57E/wC+W8zq+9nG6cDv93Zid7MPhOWRPYg6DX/XIlelOwlaQacQ1AaeQDoZmh15PgbZDaLYKfzDe6+EvgjwciZEQSjQc5291fd/TGCUTV5xwGr3f377t7j7vcTlLc+X2Qbq4GvE4xV39m+QvCr448E5bLTK9AGSZnB9pzLMOTuXWw/YiPJbb0K/PXO2Fa4vSklrDYbuNPdXwpvLw2XdRB8J/obutg7bLBgWQ1FavIltiMR7v7xSm1b0kvJXjIlrKOfCIwws/yY8VqC+WgmEOw8nkywkxK2H7qYHzb4sZ3UXJHYaAetZIqZNRHU5Q8hGAefdwPwW4JEvw34MsG4+TuBNe4+Kxz7vwr4JrAsfNwhwCZ37945z0CkPKrZS9bMBr7v7mvc/bn8hWA45skERymPIxgSeA1BaWcLlDRsUCS11LMXGYCZLQLe7u6zB11ZJMXUsxcpYMH0u38VHmF7OMHRu2kYKioSiXbQimxvLEHpZhLBAVEXEkwjLVLVVMYREckAlXFERDJAyV5EJAN2as3+rW99q0+ZMmXQ9TZv3szo0aMHXa9UccdLImba4yURM+3xkoiZ9nhJxEx7vCRiVireypUrX3L3txW9c2eeFmvGjBleilwuV9J6pYo7XhIx0x4viZhpj5dEzLTHSyJm2uMlEbNS8YD7XKclFBHJLiV7EZEMULIXEckAJXsRkQxQshcRyQAle5GdoKOjg2nTptHY2Mi0adPo6OiodJMkYzQ3jkjCOjo6aG1tpb29nW3btjFixAiam5sBaGpqqnDrJCtK6tmb2TfM7FEzW2VmHWY2yswOMLNfm9kfzOx6M9sl6caKVKO2tjba29tpaGigpqaGhoYG2tvbaWtrq3TTJEMGTfZmtg/wNeAwd58GjCA4ecMi4GJ3P4jgZNLNSTZUpFp1d3cza9as7ZbNmjWL7m6d3Ep2nlJr9jXArmZWA+wGPAscCdwY3r8EOD7+5olUv7q6OlasWLHdshUrVlBXV1ehFkkWDZrs3f0ZYDGwhiDJvwysBDa4e0+42lpgn6QaKVLNWltbaW5uJpfL0dPTQy6Xo7m5mdbW1ko3TTJk0PnszWwP4EfAF4ANwA/D22e7+4HhOvsCt7r7e4o8fi4wF2DChAkzli1b1neVHWzatIkxY8YM7ZnsxHhJxEx7vCRipj1enDGXL1/Otddey5o1a9hvv/045ZRTaGxsTE37koyZ9nhJxKxUvIaGhpXufljRO/ubNCd/AU4A2gtu/w1wGfASUBMu+wBwx2CxNBFa9cZLImba4yURM+3xkoiZ9nhJxKzWidDWAO83s93MzIBG4DEgB3w+XGc2OnWbiEhqlVKz/zXBjtj7gUfCx1wBtABnmNkTwF5Ae4LtFBGRCEo6qMrdzwbO7rP4SeDw2FskIiKx03QJIiIZoGQvIpIBSvYiIhmgZC8ikgFK9iIiGaBkLyKSAUr2IiIZoGQvIpIBSvYiIhmgZC8ikgFK9iIiGaBkLyKSAUr2IiIZoGQvIpIBSvYiIhmgZC8ikgFK9iIiGaBkLyKSAUr2Q9TR0cG0adNobGxk2rRpdHR0VLpJIiKDKukctBLo6OigtbWV9vZ2tm3bxogRI2hubgagqampwq0TEemfevZD0NbWRnt7Ow0NDdTU1NDQ0EB7ezttbW2VbpqIyICU7Iegu7ubWbNmbbds1qxZdHd3V6hFIiKlUbIfgrq6OlasWLHdshUrVlBXV1ehFomIlEbJfghaW1tpbm4ml8vR09NDLpejubmZ1tbWSjdNRGRAg+6gNbN3AdcXLHoH8C3gB+HyKcBq4ER3Xx9/E9MjvxN2/vz5dHd3U1dXR1tbm3bOikjqDdqzd/fH3f0Qdz8EmAG8CvwEWAgsd/eDgOXh7WGvqamJVatWsXz5clatWpW6RK+ho
SJSzFCHXjYCf3T3p8zs08AR4fIlQBfQEl/TZKg0NFRE+jPUmv0XgXxXcYK7PwsQ/t07zobJ0GloqIj0x9y9tBXNdgHWAfXu/ryZbXD38QX3r3f3PYo8bi4wF2DChAkzli1bNui2Nm3axJgxY0p8CoOLO14SMeOI19jYyB133EFNTU1vvJ6eHo4++miWL1+eijZWU7wkYqY9XhIx0x4viZiVitfQ0LDS3Q8reqe7l3QBPg3cWXD7cWBieH0i8PhgMWbMmOGlyOVyJa1XqrjjJREzjnj19fXe2dm5XbzOzk6vr6+PHLswZlzSHi+JmGmPl0TMtMdLImal4gH3eT/5dyhlnCbeLOEA/AyYHV6fDfx0CLEkARoaKiL9KWkHrZntBnwM+ErB4vOBG8ysGVgDnBB/82QokhgaamZFl3uJ5T8RSYeSkr27vwrs1WfZnwlG50iKNDU10dTURFdXF0cccUTkeIVJfcrCW1h9/icixxSRnU9H0IqIZECqkr0OCIpOr2F0ZtZ7aWho6L0uUs1SM5+9DgiKTq9hPFS6kuEoNT17HRAUnV5DEelPapK95oqPTq+hiPQnNcm+WuaKT3NNvFpeQxHZ+VJTs88fEJSvN+cPCEpTCSLtNfFqeA1FpDJSk+yrYa74wpp4fhx7e3s78+fPT0U7q+E1FJHKSE2yrwbVUBOP+6CquA00hFFH5YokJzXJPu0lEnizJt7Q0NC7TDXxodGwRpHKSM0O2moYNqiJxkSkWqWmZ18tJRJQTVxEqk9qevbVMmww7eegFREpJjXJXiUSEZHkpKaMoxKJiEhyUtOzh2yWSOI+IjfNR/iKSOWkpmefRXEPN62G4asiUhmp6tlnTdzDTath+KqIVEaqkn01lDTijBn3cNNqGL4q8Sh2cpW0nWBFJcV0SU0ZpxpKGnHHjPuIXB3hmx35I5HTehSySoop5O477TJjxgzvT319vXd2drq7ey6Xc3f3zs5Or6+v7/cxA4k7XhIxly5d6gcccIB3dnb6XXfd5Z2dnX7AAQf40qVLUxGvr/1bbo4lTlLx8u9JnNLexrjb5x5PG5P4/uUl8T7HHbNS8YD7vJ/8m5qefTWUNOKOGfdw06amJu6++26OOeYYtmzZQm1tLaeddtqw70l1dHTQ1tbW+xq2trYO++ecdioppk9qkn01lDSSiBnnLJUdHR3ccsst3Hbbbdv9dJ45c+awTX4qF6STSorpk5odtHEfQZvEEblpP8o3i6Nxsvicq0HavytZVFLP3szGA98DpgEOzAEeB64HpgCrgRPdfX25DUmipBFnvKRixqm7u5u1a9cybdq03va1tLQM65/OWXzO1SDt35UsKrWMcwlwu7t/3sx2AXYDzgKWu/v5ZrYQWAi0RGlM3CfeSOJEHmk+OcikSZNYsGABS5cu7S1pnHTSSUyaNKnSTUtMFp9ztUjzdyWLBi3jmNnuwIeBdgB3f93dNwCfBpaEqy0Bjk+qkVK6vmOt0zb2OglZfM4iQ2U+yKngzOwQ4ArgMeC9wErg68Az7j6+YL317r5HkcfPBeYCTJgwYcayZcsGbdSmTZsYM2bMEJ7Gzo2XRMw44jU2NtLS0kJHRwdr1qxhv/32o6mpiUWLFrF8+fLIbTz19s1c/fHRkePEGa/annPcn5u42wfp/GwnGS+JmJWK19DQsNLdDyt6Z39jMvMX4DCgB3hfePsS4F+BDX3WWz9YrIHG2bsH48Tr6+v9LW95i9fX10ceHx53vEJpHJeb5Nhm93SOs6+255yVcfbVFC+JmNU6zn4tsNbdfx3evpGgPv+8mU1092fNbCLwQgmx+lUNR9CmXX4ERP4550dADOeRKVl8ziLlGDTZu/tzZva0mb3L3R8HGglKOo8Bs4Hzw78/jdKQwiF0+R067e3tzJ8/v6zkHHe8apDFERBZfM4i5Sh1nP184Dozexg4BPgOQZL/mJn9AfhYeLtshUPo8hMnrV27NlVH0FaDLJ4TIIvPuRpoIrR0K
Wnopbs/SFC776sxroZMmjSJlpYWrrvuut6yy8knn1z2EDodwSdSOVkso6Zdao6ghTdn8uvv9lDoCD6RytGRzemTmrlx1q1bx9VXX71d7fWCCy7g1FNPLSuearmSFv2N+4/SmUm7JMqomvAumtQk+7q6OiZPnsyqVat6d6jmcrnUTDImUq7CpJ7W+efjFncZVWWh6FJTxlHZRWT4iPv7rLJQdKnp2avsEg/91JU0iPv7nNXRdXFKTbIHlV2i0k9dSZM4v88aXRddaso4Ep1+6spwpTJvdKnq2Us0+qkrw5XKvNGpZ19hcR5lmP+pWygLP3V1pGY26EjpaNSzr6C4a+xZnBRM+ylESqOefQXFXWNvamqira2N+fPnc/TRRzN//vxh/1NX+ylESqNkP0RxlgySqLFn7adu3BPoiQxXKuMMQdwlAw0niy7uCfREhiv17Icg7pKBhpPFI84J9ESGK/Xsh6C7u5sf/vCHHHPMMWzZsoXa2lrmzJlTdslAw8mii3sCPZHhSsl+CMaPH88VV1zBBRdcwNSpU3nsscdYsGAB48ePH/zB/dBRw9EkMYGeyHCkZD8EGzduZNy4cUyfPp1t27Yxffp0xo0bx8aNGyvdtMyKc7jpe8+5k5df27rD8ikLb9nu9rhdR/LQ2UeV3WaRSlCyH4Kenh4WL168Xclg8eLFzJkzp9JNy6w4S2Evv7Z1h+mHi/3i6pv8RapBqnbQxn0kZNzxamtrWb58+XbLli9fTm1tbWramEVZG26aVfquRJOann3cwxqTOLLyIx/5CNdddx2nn346559/PrfeeiuXXXYZRx1V3k96Hf0pUhp9V6JLTc8+7mGNSRxZ+cwzz3D88cdz1VVX8clPfpKrrrqK448/nmeeeSY1bRQZjvRdiS41Pfu4jyZN4ujU7u5uHnjgAUaOHNlby926dSujRo1KTRtFhiN9V6JLTc8+7hkbk5gBshraKDIc6bsSXWqSfdxHkyZxdGo1tFFkONJ3JbqSyjhmthp4BdgG9Lj7YWa2J3A9MAVYDZzo7uvLbUjcR5M2NTVx9913b3e062mnnRZpZ04SbYwzXlaZWdHlmjahdGl/DfVdiW4oNfsGd3+p4PZCYLm7n29mC8PbLVEaE+fRpB0dHdxyyy3cdttt2+29nzlzZuSEH+cRrzqCNrp8Qpqy8JYdxsnLwPIHku3fcnPR+/PHFKThQDJ9V6KJsoP208AR4fUlQBcRk32cCvfe5z8c7e3tzJ8/X70BkZAOJMuOUpO9A3eamQPfdfcrgAnu/iyAuz9rZnsXe6CZzQXmAkyYMIGurq5BN7Zp06aS1htId3c327Zto6urqzfetm3b6O7ujhw7rjZWU7y8uGOmLV7fx/f3OkbZTtaeczV8ttPexljiufugF2BS+Hdv4CHgw8CGPuusHyzOjBkzvBS5XK6k9QZSX1/vnZ2d28Xr7Oz0+vr6yLELY8Yl7fHc3fdvuXlYxyv2+GKvY5TtZPE5V8NnO+1tLDUecJ/3k39LGo3j7uvCvy8APwEOB543s4kA4d8Xov3biZf23ouIvGnQMo6ZjQbe4u6vhNePAv4F+BkwGzg//PvTJBs6VNp7L5WmWTQlTUqp2U8AfhIOzaoBlrr77Wb2W+AGM2sG1gAnJNfM8mjvvVSSdn5Kmgya7N39SeC9RZb/GWhMolEiIhKv1MyNI8ObShoilTWsk31HRwdtbW29NfvW1lbV7CtEJY3oquEfZr6NTy06ruj9+YO39E995xu2yV7zX8twUw3/MHvbeP6b0yykrY1ZlZqJ0OKm+a9FRN6Uqp79/PnzufLKK7ebuOzSSy8tK1Z3dzdr165l2rRpvWWclpaW1M1/rVKTiBR6z5L3FL9jyY6LHpn9SMlxU5Ps58+fz+WXX86iRYuYOnUqjz32GC0twVQ75ST8SZMmsWDBApYuXdpbxjnppJOYNGlS3E0vm0pNItJXsQQex/Dx1JRxrrzyShYtWsQZZ5zBqFGjOOOMM
1i0aBFXXnll2TH7Ttva3zSulaJSk4jsLKnp2W/ZsoV58+Ztt2zevHmceeaZZcVbt24dRx55JI2Njbg7ZkZjYyOdnZ1xNDcWSZxqLe3zksvwNrZuIe9ZsnDHO5b0XQ+gstNRZ+27kppkX1tby+WXX84ZZ5zRu+zyyy+ntra2rHjjx48nl8uxePHi3rLQggULGD9+fFxNjix/qrWGhobeZVFPteaa210q6JXu81M/YiivMKln4fuSmmR/2mmn9dbop06dykUXXURLS8sOvf1Sbdy4kXHjxjF9+nS2bdvG9OnTGTduHBs3boyz2ZHkJ2vL1+zzk7WpjCMicUtNss/vhD3rrLN6R+PMmzev7NE4PT09LF68eLuJ0BYvXsycOXPibHYkaZ6srRoO4JHoVHbJjtQk+7jV1tayfv16Vq1a1fsz8qKLLiq7LJQ11XAAj0SX9rJLYadjoFMnqtMxuNQk+7iHXsZdFkqChl6KDEydjvikJtkXDr3s6urq3VF71llnlZXs4y4LJUHnyRWRnSU1yT7uoZcAM2fOJJfL0d3dzYEHHsjMmTOjNjNWSQy9lPSopnp4nIr2sm/fcV9PpWR1f1Rqkn3cQy+roUSSxNBLSY+018OTUGz4YpRhjUn8w8xqaSg1yT7uGns1lEg09FJkYFn8h5mU1CT7uGvs3d3dfOc739nhCNqoJZI4Jy5L89DLtEvip3hWyy4STVITl8UtNckegoR/6aWXxjLpz6677sovfvELTj/9dI499lhuvfVWLrvsMkaPHl12zCRKQzpPbnmS+CmuXqSUI6mJy+KWmonQ4rZ582bGjh3LCSecwKhRozjhhBMYO3YsmzdvLjumJi4TkWqVqp593C688MLtSiQXXnghc+fOLTueRs+IVL+slutSlezjrIebGZdccglPPPEEb7zxBk888QSXXHJJpGmO6+rqOOecc7jpppt623j88cdr9IxIFclquS41yT7uevjkyZN59NFHmTlzJt/4xje4+OKLufvuu9l3333LbmNDQwOLFi3a4SjfNB2VKyJSTMnJ3sxGAPcBz7j7cWZ2ALAM2BO4H/iSu79ebkPiHir5wgsvcPDBB3PPPfdw9913Y2YcfPDBPPXUU+U2kVwuR0tLC1ddddV2pzq86aabyo4pIgNL+0Fa1WIoPfuvA93A7uHtRcDF7r7MzC4HmoHLym1I3PXwLVu2sHDhQi688MLexHzmmWdGmvWyu7ubBx54gHPPPbf3H9LWrVs577zzyo4p5clq3TUJaU6mcR+klWUlJXszm0zwjWkDzrCg8H0kcFK4yhLg20RI9nEfTVpTU8OZZ57Jj370o96y0Oc+9zlqasqvXOmI1/TIYt01iX9wSqbZUWrm+3dgATA2vL0XsMHde8Lba4F9ojQk7qNJd999d15++WUeeOABpk6dysMPP9x7QpO0tFFkKLL4D07iM2iyN7PjgBfcfaWZHZFfXGTVomcQMLO5wFyACRMm0NXVVXQ7EydO5OSTT2bOnDmsWbOG/fbbj1NOOYWJEyf2+5iBbNiwgeOOO46FCxeydetWRo4cySc+8QluvvnmsuIl0cbCXwh95XK5stpYqNzn2d/jN23aVDRmqdtJe7wkYqY9XqnbiSpt8UopXY0eWf52+ntfyhVLPHcf8AKcR9BzXw08B7wKXAe8BNSE63wAuGOwWDNmzPBS5HK5ktYbSH19vXd2dm4Xr7Oz0+vr6yPHLowZl/1bbk5VvGKPL/acS91O2uMlETPt8YayneEcL4mYceeHUuMB93k/+XfQI2jd/Z/cfbK7TwG+CHS6+8lADvh8uNps4KfR/u3EK19yyeVy9PT09JZcWltbK900EZGdLso4+xZgmZmdCzwAtEdtjCYZk0pL88gUkSiGlOtJrRcAAA9wSURBVOzdvQvoCq8/CRweV0M0yZhUmkamyHCWmonQNMmYiEhyUjNdQhKTjPU3D06wH0NEktL3u2eLgr9RvnuFMfPxosRMoo1plpqef
f6ApUJRD1jK74Xev+XmviOMRCRBhd+3XC4Xy3evWLwoMZNoY5qlpmevA5ZkONIOX0mL1CR7jZ6R4UY7fCVNUpPsQaNnRAajXwpSrlQlexm+NEtldPqlIFEo2ctOoUm8RCorNcl+oNMFDuc95HF47zl38vJrW3dY3jdxjtt1JA+dfVRJMauhJ66ShkjpUpPsCxO6fpoOzcuvbY2915z2nrhKGiJDk5px9iIikpzU9OxFpHKydjRpFqlnLyKZO5o0i5TsRUQyQGWcIdDEaiJSrdSzH4LCn7qFk6uJiKSdkr2ISAYo2YuIZICSvYhIBijZi4hkgJK9iEgGKNmLiGSAkr2ISAYo2YuIZMCgR9Ca2SjgV0BtuP6N7n62mR0ALAP2BO4HvuTuryfZ2OEiifnnRUQGUsp0CVuAI919k5mNBFaY2W3AGcDF7r7MzC4HmoHLEmzrsJHE/PMiIgMZtIzjgU3hzZHhxYEjgRvD5UuA4xNpoYiIRFZSzd7MRpjZg8ALwF3AH4EN7t4TrrIW2CeZJoqISFQlzXrp7tuAQ8xsPPAToK7YasUea2ZzgbkAEyZMoKurq6SGlbpeqeKOFzVm38du2rSpaLxStjGU88V2dY2uSBuTiFfKNuKQ9s9i3PH6e1+Ga7wkYqYyXuFMjqVcgLOBfwReAmrCZR8A7hjssTNmzPBS7N9yc0nrlSrueFFjFntsLpcrextxx0siZhJtjPOxOytm2uO5F39fhnO8JGJWKh5wn/eTfwct45jZ28IePWa2K/BRoBvIAZ8PV5sN/DTavx0REUlKKWWcicASMxtBUOO/wd1vNrPHgGVmdi7wANCeYDtLpmGN8Sk6Guj2HV/HSsXLKzypTP7cqZCek8okcX7XtD9nSZ9Bk727PwxML7L8SeDwJBoVhYY1xqPvawjBa1ZseSXiFconuGLvcxoUJuC42pj25yzpoyNoRUQyoOLnoFXZRUQkeRVP9iq7iIgkT2UcEZEMqHjPvhrEXWoaykFQEH0HpoiIkn0J4i41vdJ9vkpXIrJTqYwjIpIBSvYiIhmgMs4wkdTRqSIyPCjZDwNJHp0qIsODyjgiIhlQ8Z59NQxDrIY2ZllHRwdtbW10d3dTV1dHa2srTU1NlW6WSKpUPNlXwzDEamhjVnV0dNDa2kp7ezvbtm1jxIgRNDc3AyjhixRQGUeqWltbG+3t7TQ0NFBTU0NDQwPt7e20tbVVumkiqVLxnn3cqqXkUi2jZ5KYiz1O3d3dzJo1a7tls2bNoru7u0ItEkmnYZfsq6HkUk2jZ5KYiz1OdXV1rFixgoaGht5lK1asoK6u2GmSRbJLZRypaq2trTQ3N5PL5ejp6SGXy9Hc3Exra2ulmyaSKsOuZy/p1l9ZCMorDeV3ws6fP793NE5bW5t2zor0oZ697FSFZ7vP5XLb3S5XU1MTq1atYvny5axatUqJXqQIJXsRkQxQshcRyYBU1OzjHoaYxLDGahkqKdGlfbipSDkqnuzjHoaYxLDGahoqKdGlfbipSDlUxhERyYBBe/Zmti/wA+DtwBvAFe5+iZntCVwPTAFWAye6+/pyGxL3kLy+MeOOVxgz7nhxxYwjnogMD6X07HuAM929Dng/8HdmNhVYCCx394OA5eHtsiUxJC+peH1jxh0vTc9ZRIaHQZO9uz/r7veH118BuoF9gE/z5owzS4Djk2qkiIhEY0Pp9ZnZFOBXwDRgjbuPL7hvvbvvUeQxc4G5ABMmTJixbNmyQbezadMmxowZU3K7dna8JGKmPV4SMdMeL4mYaY+XRMy0x0siZqXiNTQ0rHT3w4reWfhTf6ALMAZYCXw2vL2hz/3rB4sxY8YML0UulytpvVLFHS+JmGmPl0TMtMdLImba4yURM+3xkohZqXjAfd5P/i1pNI6ZjQR+BFzn7j8OFz9vZhPD+ycCL5QSS0REdr5Bk70FwzvagW53v6jgrp8Bs8Prs4Gfxt88ERGJQykHVX0Q+BLwiJk9GC47Czgfu
MHMmoE1wAnJNFFERKIaNNm7+wrA+rm7Md7miIhIEnQErYhIBgxp6GXkjZm9CDxVwqpvBV6KcdNxx0siZtrjJREz7fGSiJn2eEnETHu8JGJWKt7+7v62Ynfs1GRfKjO7z/sbK5qCeEnETHu8JGKmPV4SMdMeL4mYaY+XRMw0xlMZR0QkA5TsRUQyIK3J/oqUx0siZtrjJREz7fGSiJn2eEnETHu8JGKmLl4qa/YiIhKvtPbsRUQkRqlK9mb2cTN73MyeMLNI8+OH8a4ysxfMbFVM7dvXzHJm1m1mj5rZ12OIOcrMfmNmD4Uxz4mprSPM7AEzuzmGWKvN7BEze9DM7oupfePN7EYz+134en4gQqx3hW3LXzaa2d9HbN83wvdjlZl1mNmoiPG+HsZ6tNy2Ffs8m9meZnaXmf0h/LvDzLNDjHdC2MY3zGzIoz/6iflv4fv8sJn9xMzGDxSjhHj/GsZ60MzuNLNJUeIV3PcPZuZm9tZS4w3Qxm+b2TMFn8ljI8a7viDW6oLZDErX3wxpO/sCjAD+CLwD2AV4CJgaMeaHgUOBVTG1cSJwaHh9LPD7GNpowJjw+kjg18D7Y2jrGcBS4OYYYq0G3hrz+70E+HJ4fRdgfIyfo+cIxhuXG2Mf4E/AruHtG4BTI8SbBqwCdiM4av0XwEFlxNnh8wxcACwMry8EFkWMVwe8C+gCDoupjUcBNeH1RTG0cfeC618DLo8SL1y+L3AHwXFAQ/qs99PGbwP/UObnZcC8BVwIfGuocdPUsz8ceMLdn3T314FlBCdIKZu7/wr4SxyNC+P1dyKXKDHd3TeFN0eGl0g7UsxsMvAJ4HtR4iTFzHYn+EC3A7j76+6+IabwjcAf3b2Ug/cGUgPsamY1BEl6XYRYdcC97v6qu/cAvwQ+M9Qg/Xyeyz6JULF47t7t7o8PtW2DxLwzfN4A9wKTI8bbWHBzNEP4vgyQEy4GFgwlVgkxyzJQvHBiyhOBjqHGTVOy3wd4uuD2WiIm0iRZcCKX6QQ98aixRoQ/y14A7nL3qDH/neCD+0bUtoUcuNPMVlpwMpqo3gG8CHw/LDV9z8xGxxAX4IuU8UUo5O7PAIsJJvh7FnjZ3e+MEHIV8GEz28vMdgOOJehJxmGCuz8LQWcE2DumuEmZA9wWNYiZtZnZ08DJwLcixvoU8Iy7PxS1XX18NSw3XTWU8togPgQ87+5/GOoD05Tsi022lsqhQmY2hmB+/7/v08soi7tvc/dDCHo8h5vZtAhtOw54wd1XRm1XgQ+6+6HAMQTnIP5wxHg1BD9TL3P36cBmIp7DGMDMdgE+BfwwYpw9CHrMBwCTgNFmdkq58dy9m6B8cRdwO0GJsmfABw1DZtZK8LyvixrL3Vvdfd8w1lcjtGk3oJWI/zCKuAx4J3AIQYfhwpjiNlFmZyZNyX4t2/d2JhPtp3MirPiJXGIRljK6gI9HCPNB4FNmtpqgFHakmV0bsV3rwr8vAD8hKLlFsRZYW/AL5kaC5B/VMcD97v58xDgfBf7k7i+6+1bgx8DMKAHdvd3dD3X3DxP8RB9yz6wfVXESITObDRwHnOxh4TkmS4HPRXj8Own+qT8UfmcmA/eb2dujNMrdnw87cW8AVxL9O0NYUvwscH05j09Tsv8tcJCZHRD20L5IcIKU1AjrZcVO5BIl5tvyoxPMbFeCRPO7cuO5+z+5+2R3n0LwGna6e9m9UjMbbWZj89cJdrZFGt3k7s8BT5vZu8JFjcBjUWKGyu719LEGeL+Z7Ra+540E+2fKZmZ7h3/3I/jCxtFOqIKTCJnZx4EW4FPu/moM8Q4quPkpon1fHnH3vd19SvidWUswCOO5iG2cWHDzM0T8zoQ+CvzO3deW9ehy9hYndSGoZf6eYFROawzxOgh+Qm0leBObI8abRVBaehh4MLwcGzHmXwEPhDFXUcZe9gFiH0HE0TgE9fWHwsujcbwvYdxDgPvC530TsEfEeLsBfwbGxdS+cwiSyCrgGqA2Y
rz/IfiH9hDQWGaMHT7PwF7AcoJfCsuBPSPG+0x4fQvwPHBHDG18gmB/XP47M5TRM8Xi/Sh8Xx4Gfg7sEyVen/tXM/TROMXaeA3wSNjGnwETo7YRuBqYV+5nUEfQiohkQJrKOCIikhAlexGRDFCyFxHJACV7EZEMULIXEckAJXupCma2LZzxb5WZ/TA88rEqmNndlW6DiJK9VIvX3P0Qd58GvA7MK7zTAqn8PLt7pKNvReKQyi+HyCD+BzjQzKZYMBf+fwP3A/ua2VFmdo+Z3R/+AhgDYGbHhnOqrzCz/7Bwnv9w3vGrzKzLzJ40s6/lN2JmN4WTvz1aOAGcmW0KJ+J6yMzuNbMJ4fIJFszX/lB4mZlfv+Cx/2hmvw0nyDonXDbazG4JH7PKzL6wE15DyRgle6kq4fwgxxAcnQjB3Os/8DcnVPsm8FEPJm67DzjDghOPfBc4xt1nAW/rE/bdwNEE85ecHc5/BDDH3WcAhwFfM7O9wuWjCaYsfi/wK+C0cPl/AL8Mlx9KcMRxYduPAg4Kt3MIMCOcVO7jwDp3f2/4y+X28l8hkeKU7KVa7BpOA30fwdw17eHyp9z93vD6+4GpwP+G684G9idI5k+6+5/C9frOS3OLu29x95cIJhKbEC7/mpk9RDAH+74EiRqCMlL+DGArgSnh9SMJZjvEg0mwXu6znaPCywMEv0TeHcZ8BPiomS0ysw8VeZxIZDWVboBIiV7zYBroXsEcZWwuXERwPoCmPutNHyT2loLr24AaMzuCYOKpD7j7q2bWBeRPTbjV35xnZBulf48MOM/dv7vDHWYzCOaGOs/M7nT3fykxpkhJ1LOX4eRe4INmdiAEc5Wb2cEEE5q9IzzhDEApNfFxwPow0b+b4FfDYJYDp4fbHmHBGbkK3QHMKdiPsI+Z7W3BOVRfdfdrCU6aEsd0zyLbUc9ehg13f9HMTgU6zKw2XPxNd/+9mf0/4HYzewn4TQnhbgfmmdnDwOME/0gG83XgCjNrJujxnw7cU9C+O82sDrgn/FWyCTgFOBD4NzN7g2Cmw9NL2JbIkGjWS8kEMxvj7pvC+en/C/iDu19c6XaJ7Cwq40hWnBbutH2UoESzQ91cZDhTz15EJAPUsxcRyQAlexGRDFCyFxHJACV7EZEMULIXEckAJXsRkQz4/wjdhou+UHRZAAAAAElFTkSuQmCC", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "df[df['Age'].notnull()].boxplot('Age','Pregnancies')" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "por_embarazos" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "por_embarazos['Age']" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 36.5\n", "1 24.0\n", "2 43.0\n", "3 24.0\n", "4 25.0\n", " ... \n", "763 40.5\n", "764 25.0\n", "765 36.0\n", "766 24.0\n", "767 24.0\n", "Name: Age, Length: 768, dtype: float64" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "por_embarazos['Age'].transform('median')" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " Pregnancies Glucose BloodPressure SkinThickness Insulin BMI \\\n", "0 6 148 72 35 0 33.6 \n", "1 1 85 66 29 0 26.6 \n", "2 8 183 64 0 0 23.3 \n", "3 1 89 66 23 94 28.1 \n", "4 0 137 40 35 168 43.1 \n", "\n", " DiabetesPedigreeFunction Age Outcome \n", "0 0.627 50.0 1 \n", "1 0.351 31.0 0 \n", "2 0.672 32.0 1 \n", "3 0.167 24.0 0 \n", "4 2.288 33.0 1 " ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Fill each missing Age with the median Age of its Pregnancies group (por_embarazos)\n", "# Assigning the result avoids chained inplace=True, which newer pandas versions do not apply to df\n", "df['Age'] = df['Age'].fillna(por_embarazos['Age'].transform('median'))\n", "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Outliers\n", "\n", "" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [], "source": [ "df = pd.read_csv(os.path.join(\"diabetes.csv\"))" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": 
"", "text/plain": [ "
6iBUrVvDQQw9x/fXXs2/fvmaPJtVSK+ARcTawDvjyKzbfDKyLiMer926e//Gkxjl8+DBr1649ZtvatWs5fPhwkyaS5qbWzawy82fAucdt+zHTV6VIRWptbeXjH/84u3fvPnozqw0bNtDa6j3eVAZ/ialFa/ny5Rw6dIj9+/czNTXF/v37OXToEMuXL2/2aFIt3sxKi1ZLSwsXXXQRExMTR7etWrWKAwcOeBmhXlO8mZV0nKVLlzIxMcGNN97IwYMHufHGG5mYmGDp0qXNHk2qxYBr0XrhhRdYtmwZGzdu5Oyzz2bjxo0sW7aMF154odmjSbUYcC1q27dvp6+vj7a2Nvr6+ti+fXuzR5JqM+BatCKC0dFRxsbGOHLkCGNjY4yOjhIRzR5NqsWAa9Fat24dO3bs4KabbuLQoUPcdNNN7Nixg3Xr1jV7NKkWr0LRovb+97+fBx98kMwkIli3bh33339/s8eSjnGyq1D8xYLOSKeyDJKZPPDAA3M6diFPgKTjuYSiM1JmzunfWz9x75yPMd5qNgMuSYUy4JJUKAMuSYUy4JJUKAMuSYUy4JJUKAMuSYUy4JJUKAMuSYUy4JJUKAMuSYUy4JJUKAMuSYUy4JJUqFoBj4hzImJ3RHw7IsYj4j0RsSIiHoyIx6vHNzd6WEnSr9Q9A/8M8JXMfAdwGTAObAX2ZualwN7qtSRpgcwa8IhYDvwmMACQmS9m5kHgOmBXtdsuYH2jhpQknajOGfjbgEngXyNif0R8NiLeAFyQmc8AVI/nz3RwRGyJiJGIGJmcnJy3wSVpsasT8Fbg3cCOzHwX8AJzWC7JzJ2Z2Z2Z3e3t7ac4piTpeHUCPgFMZObD1evdTAf92Yi4EKB6fK4xI0qSZjJrwDPzh8D/RMTbq01rgW8B9wCbq22bgbsbMqEkaUatNffrA74QEWcBTwB/yHT874iIXuBpYGNjRpQkzaRWwDPzUaB7hrfWzu84kqS6/CWmJBXKgEtSoQy4JBXKgEtSoQy4JBXKgEtSoQy4JBXKgEtSoQy4JBXKgEtSoQy4JBXKgEtSoQy4JBXKgEtSoQy4JBXKgEtSoQy4JBXKgEtSoQy4JBXKgEtSoQy4JBXKgEtSoQy4JBXKgEtSoVrr7BQRTwI/AY4AL2Vmd0SsAG4HOoAngU2Z+XxjxpQkHW8uZ+BrMvPyzOyuXm8F9mbmpcDe6rUkaYGczhLKdcCu6vkuYP3pjyNJqqtuwBN4ICJGI2JLte2CzHwGoHo8f6YDI2JLRIxExMjk5OTpTyxJAmqugQNXZ+aBiDgfeDAivl33AzJzJ7AToLu7O09hRknSDGqdgWfmgerxOeBO4Erg2Yi4EKB6fK5RQ0qSTjRrwCPiDRGx7OXnwG8DY8A9wOZqt83A3Y0aUpJ0ojpLKBcAd0bEy/v/W2Z+JSL+E7gjInqBp4GNjRtTknS8WQOemU8Al82w/cfA2kYMJUmanb/ElKRCGXBJKpQBl6RCGXBJKpQBl6RCGXBJKpQBl6RCGXBJKpQBl6RCGXBJKpQBl6RCGXBJKpQBl6RCGXBJKpQBl6RCGXBJKpQBl6RCGXBJKpQBl6RCGXBJKpQBl6RCGXBJKpQBl6RCGXBJKlRr3R0jogUYAX6QmR+MiEuA24AVwCPARzLzxcaMqcXssk89wKGfTzX8czq27mnof/9NS5fwjU/+dkM/Q4tL7YADfwqMA8ur17cA2zPztoj4Z6AX2DHP80kc+vkUT978gWaPcdoa/T8ILT61llAiYhXwAeCz1esArgF2V7vsAtY3YkBJ0szqroF/GvhL4JfV63OBg5n5UvV6Alg504ERsSUiRiJiZHJy8rSGlST9yqwBj4gPAs9l5ugrN8+wa850fGbuzMzuzOxub28/xTElScerswZ+NfChiPhdoI3pNfBPA+dERGt1Fr4KONC4MSVJx5v1DDwz/yozV2VmB/Bh4D8y83pgCNhQ7bYZuLthU0qSTnA614F/AvhYRHyX6TXxgfk
ZSZJUx1wuIyQzvwp8tXr+BHDl/I8kSarDX2JKUqEMuCQVyoBLUqEMuCQVyoBLUqEMuCQVyoBLUqEMuCQVyoBLUqEMuCQVyoBLUqEMuCQVyoBLUqEMuCQVyoBLUqEMuCQVyoBLUqEMuCQVyoBLUqEMuCQVyoBLUqEMuCQVyoBLUqEMuCQVataAR0RbRHw9Ir4REY9FxKeq7ZdExMMR8XhE3B4RZzV+XEnSy+qcgR8GrsnMy4DLgd+JiKuAW4DtmXkp8DzQ27gxJUnHmzXgOe2n1csl1b8ErgF2V9t3AesbMqEkaUa11sAjoiUiHgWeAx4EvgcczMyXql0mgJUnOXZLRIxExMjk5OR8zCxJombAM/NIZl4OrAKuBDpn2u0kx+7MzO7M7G5vbz/1SSVJx5jTVSiZeRD4KnAVcE5EtFZvrQIOzO9okqRXU+cqlPaIOKd6vhR4HzAODAEbqt02A3c3akhJ0olaZ9+FC4FdEdHCdPDvyMx7I+JbwG0R8XfAfmCggXNqEVvWuZV37tra7DFO27JOgA80ewydQWYNeGb+F/CuGbY/wfR6uNRQPxm/mSdvLj98HVv3NHsEnWH8JaYkFcqAS1KhDLgkFcqAS1KhDLgkFcqAS1KhDLgkFcqAS1KhDLgkFcqAS1KhDLgkFcqAS1KhDLgkFcqAS1Kh6twPXGq6M+FWrG9auqTZI+gMY8D1mrcQ9wLv2LrnjLjnuBYXl1AkqVAGXJIKZcAlqVAGXJIKZcAlqVAGXJIKZcAlqVAGXJIKNesPeSLiLcDngV8DfgnszMzPRMQK4HagA3gS2JSZzzduVKm+iJj7MbfM/XMyc+4HSfOkzhn4S8BfZGYncBXwxxHx68BWYG9mXgrsrV5LrwmZuSD/pGaaNeCZ+UxmPlI9/wkwDqwErgN2VbvtAtY3akhJ0onmtAYeER3Au4CHgQsy8xmYjjxw/kmO2RIRIxExMjk5eXrTSpKOqh3wiHgj8CXgzzLz/+oel5k7M7M7M7vb29tPZUZJ0gxqBTwiljAd7y9k5perzc9GxIXV+xcCzzVmREnSTGYNeEx/nT8AjGfmP7zirXuAzdXzzcDd8z+eJOlk6twP/GrgI8A3I+LRattfAzcDd0REL/A0sLExI0qSZjJrwDNzGDjZRbVr53ccSVJd/hJTkgoVC/ljhIiYBJ5asA+U6jsP+FGzh5BO4q2ZecJlfAsacOm1KiJGMrO72XNIc+ESiiQVyoBLUqEMuDRtZ7MHkObKNXBJKpRn4JJUKAMuSYUy4Fo0IuL3IiIj4h3NnkWaDwZci0kPMAx8uNmDSPPBgGtRqO5nfzXQSxXwiHhdRPxTRDwWEfdGxL9HxIbqvSsiYl9EjEbE/S/fOll6LTHgWizWA1/JzP8G/jci3g38PtN/lPudwB8B74Gj97+/FdiQmVcAnwO2NWNo6dXUuZ2sdCboAT5dPb+ter0E+GJm/hL4YUQMVe+/HegCHqz+un0L8MzCjivNzoDrjBcR5wLXAF0RkUwHOYE7T3YI8FhmvmeBRpROiUsoWgw2AJ/PzLdmZkdmvgX4PtN3H/yDai38AuC3qv2/A7RHxNEllYj4jWYMLr0aA67FoIcTz7a/BFwETABjwL8ADwOHMvNFpqN/S0R8A3gUeO/CjSvV40/ptahFxBsz86fVMsvXgasz84fNnkuqwzVwLXb3RsQ5wFnA3xpvlcQzcEkqlGvgklQoAy5JhTLgklQoAy5JhTLgklSo/wem4dYgJy8DxAAAAABJRU5ErkJggg==", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "df['Age'].plot.box()" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " Pregnancies Glucose BloodPressure SkinThickness Insulin BMI \\\n", "123 5 132 80 0 0 26.8 \n", "363 4 146 78 0 0 38.5 \n", "453 2 119 0 0 0 19.6 \n", "459 9 134 74 33 60 25.9 \n", "489 8 194 80 0 0 26.1 \n", "537 0 57 60 0 0 21.7 \n", "666 4 145 82 18 0 32.5 \n", "674 8 91 82 0 0 35.6 \n", "684 5 136 82 0 0 0.0 \n", "\n", " DiabetesPedigreeFunction Age Outcome \n", "123 0.186 69 0 \n", "363 0.520 67 1 \n", "453 0.832 72 0 \n", "459 0.460 81 0 \n", "489 0.551 67 0 \n", "537 0.735 67 0 \n", "666 0.235 70 1 \n", "674 0.587 68 0 \n", "684 0.640 69 0 " ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Percentile-based identification (IQR rule); a rule based on the standard deviation also exists\n", "q3 = df['Age'].quantile(.75)\n", "q1 = df['Age'].quantile(.25)\n", "\n", "IQR = q3 - q1\n", "\n", "df.loc[(df['Age'] > q3 + 1.5 * IQR) | (df['Age'] < q1 - 1.5 * IQR)]" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(759, 9)" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Uncomment the next line to drop the outliers from df; the shape shows how many rows would remain\n", "#df = df.loc[(df['Age'] <= q3 + 1.5 * IQR) & (df['Age'] >= q1 - 1.5 * IQR)]\n", "df.loc[(df['Age'] <= q3 + 1.5 * IQR) & (df['Age'] >= q1 - 1.5 * IQR)].shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Binning\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![](https://www.saedsayad.com/images/Binning_1.png)" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [], "source": [ "df = pd.read_csv(os.path.join(\"diabetes.csv\"))" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " Pregnancies Glucose BloodPressure SkinThickness Insulin \\\n", "count 768.000000 768.000000 768.000000 768.000000 768.000000 \n", "mean 3.845052 120.894531 69.105469 20.536458 79.799479 \n", "std 3.369578 31.972618 19.355807 15.952218 115.244002 \n", "min 0.000000 0.000000 0.000000 0.000000 0.000000 \n", "25% 1.000000 99.000000 62.000000 0.000000 0.000000 \n", "50% 3.000000 117.000000 72.000000 23.000000 30.500000 \n", "75% 6.000000 140.250000 80.000000 32.000000 127.250000 \n", "max 17.000000 199.000000 122.000000 99.000000 846.000000 \n", "\n", " BMI DiabetesPedigreeFunction Age Outcome \n", "count 768.000000 768.000000 768.000000 768.000000 \n", "mean 31.992578 0.471876 33.240885 0.348958 \n", "std 7.884160 0.331329 11.760232 0.476951 \n", "min 0.000000 0.078000 21.000000 0.000000 \n", "25% 27.300000 0.243750 24.000000 0.000000 \n", "50% 32.000000 0.372500 29.000000 0.000000 \n", "75% 36.600000 0.626250 41.000000 1.000000 \n", "max 67.100000 2.420000 81.000000 1.000000 " ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.describe()" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " Pregnancies Glucose BloodPressure SkinThickness Insulin BMI \\\n", "0 6 148 72 35 0 33.6 \n", "1 1 85 66 29 0 26.6 \n", "2 8 183 64 0 0 23.3 \n", "3 1 89 66 23 94 28.1 \n", "4 0 137 40 35 168 43.1 \n", "\n", " DiabetesPedigreeFunction Age Outcome YoungAdult \n", "0 0.627 50 1 0 \n", "1 0.351 31 0 1 \n", "2 0.672 32 1 1 \n", "3 0.167 21 0 1 \n", "4 2.288 33 1 1 " ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['YoungAdult'] = df['Age'].map(lambda age: 1 if age <= 35 else 0 ) # age <= 35 ? 1 : 0\n", "df.head()" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(270, 10)" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.loc[df['YoungAdult'] == 0].shape" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(498, 10)" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.loc[df['YoungAdult'] == 1].shape" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 low\n", "1 low\n", "2 low\n", "3 low\n", "4 very_low\n", " ... \n", "763 high\n", "764 low\n", "765 low\n", "766 very_low\n", "767 low\n", "Name: BloodPressure, Length: 768, dtype: category\n", "Categories (4, object): [very_low < low < high < very_high]" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#df['BloodPressure_Bin'] = pd.qcut(df['BloodPressure'], 4, labels=['very_low','low','high','very_high'])\n", "pd.qcut(df['BloodPressure'], 4, labels=['very_low','low','high','very_high'])" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " Pregnancies Glucose BloodPressure SkinThickness Insulin BMI \\\n", "0 6 148 72 35 0 33.6 \n", "1 1 85 66 29 0 26.6 \n", "2 8 183 64 0 0 23.3 \n", "3 1 89 66 23 94 28.1 \n", "4 0 137 40 35 168 43.1 \n", "\n", " DiabetesPedigreeFunction Age Outcome YoungAdult AgeCategogy \n", "0 0.627 50 1 0 middle \n", "1 0.351 31 0 1 young \n", "2 0.672 32 1 1 young \n", "3 0.167 21 0 1 young \n", "4 2.288 33 1 1 young " ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['AgeCategogy'] = pd.cut(df['Age'],bins=[0, 35, 55, 120], labels=['young', 'middle', 'old'])\n", "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Log transformation\n", "\n", "\n", "\n", "Remember that log(0) is undefined: log(x) tends to minus infinity as x approaches 0 (`np.log(0)` returns `-inf`), so a constant such as 1 is added before taking the log of a variable that contains zeros." ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "df['Pregnancies'].plot.density(color='c')" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.9016739791518588" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['Pregnancies'].skew()" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "np.log(df['Pregnancies'] + 1.0).plot.density(color='c')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## One-hot encoding\n", "\n", "\n", "\n", "" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [], "source": [ "df = pd.read_csv(os.path.join(\"diabetes.csv\"))" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " Pregnancies Glucose BloodPressure SkinThickness Insulin BMI \\\n", "0 6 148 72 35 0 33.6 \n", "1 1 85 66 29 0 26.6 \n", "2 8 183 64 0 0 23.3 \n", "3 1 89 66 23 94 28.1 \n", "4 0 137 40 35 168 43.1 \n", "\n", " DiabetesPedigreeFunction Age Outcome AgeCategogy \n", "0 0.627 50 1 middle \n", "1 0.351 31 0 young \n", "2 0.672 32 1 young \n", "3 0.167 21 0 young \n", "4 2.288 33 1 young " ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['AgeCategogy'] = pd.cut(df['Age'],bins=[0, 35, 55, 120], labels=['young', 'middle', 'old'])\n", "df.head()" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " Pregnancies Glucose BloodPressure SkinThickness Insulin BMI \\\n", "0 6 148 72 35 0 33.6 \n", "1 1 85 66 29 0 26.6 \n", "2 8 183 64 0 0 23.3 \n", "3 1 89 66 23 94 28.1 \n", "4 0 137 40 35 168 43.1 \n", "\n", " DiabetesPedigreeFunction Age Outcome AgeCategogy_young \\\n", "0 0.627 50 1 0 \n", "1 0.351 31 0 1 \n", "2 0.672 32 1 1 \n", "3 0.167 21 0 1 \n", "4 2.288 33 1 1 \n", "\n", " AgeCategogy_middle AgeCategogy_old \n", "0 1 0 \n", "1 0 0 \n", "2 0 0 \n", "3 0 0 \n", "4 0 0 " ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.get_dummies(df,columns=['AgeCategogy'])\n", "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Separación de valores\n", "\n", "" ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Team City Games MVP_Player\n", "0 Eagles Rome 12 John Stuart\n", "1 Bears Helsinki 15 Leo Da Vinci\n", "2 Raptors Hong Kong 23 Mike Donatello\n", "3 Hornets Hong Kong 18 Raphael Dolce\n", "4 Bees Rome 21 Bruce Lee" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.DataFrame({'Team':['Eagles', 'Bears', 'Raptors', 'Hornets', 'Bees', 'Lions'], \n", " 'City':['Rome', 'Helsinki', 'Hong Kong', 'Hong Kong', 'Rome', 'Rome'],\n", " 'Games':[12, 15, 23, 18, 21, 8],\n", " 'MVP_Player': ['John Stuart', 'Leo Da Vinci', 'Mike Donatello', 'Raphael Dolce', 'Bruce Lee', 'Mahatma Gandhi']})\n", "df.head()" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [], "source": [ "def extract_name(fullname):\n", " return fullname.split(' ')[0]" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Team City Games MVP_Player Name\n", "0 Eagles Rome 12 John Stuart John\n", "1 Bears Helsinki 15 Leo Da Vinci Leo\n", "2 Raptors Hong Kong 23 Mike Donatello Mike\n", "3 Hornets Hong Kong 18 Raphael Dolce Raphael\n", "4 Bees Rome 21 Bruce Lee Bruce" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Three equivalent ways to extract the first name; each assignment overwrites the previous one\n", "# 1) Series.apply with a lambda:\n", "# df['Name'] = df['MVP_Player'].apply(lambda fullname: fullname.split(' ')[0])\n", "# 2) DataFrame.apply row by row (axis=1):\n", "df['Name'] = df.apply(lambda row: row['MVP_Player'].split(' ')[0], axis=1)\n", "# 3) Series.apply with the extract_name function defined above:\n", "df['Name'] = df['MVP_Player'].apply(extract_name)\n", "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Feature scaling\n", "\n", "\n", "\n", "Feature scaling is a transformation applied to numeric variables so that the values of different variables lie in the same range. It is needed when using algorithms that are sensitive to the magnitude of the variables (for example, k-nearest neighbors and other distance-based methods).\n", "\n", "The most widely used method is based on the z-score (standard score), $z = (x - \\mu) / \\sigma$, where $\\mu$ is the mean and $\\sigma$ the standard deviation of the variable; the result is centered at zero with a standard deviation of 1.\n", "\n", "The z-score measures how many standard deviations a value lies from the mean.\n", "\n", "__[Boston house prices dataset](https://scikit-learn.org/stable/datasets/index.html#boston-dataset)__" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [], "source": [ "# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2, so this cell needs scikit-learn < 1.2\n", "from sklearn.datasets import load_boston\n", "from sklearn.preprocessing import StandardScaler" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " CRIM ZN INDUS CHAS NOX RM AGE DIS RAD TAX \\\n", "0 0.00632 18.0 2.31 0.0 0.538 6.575 65.2 4.0900 1.0 296.0 \n", "1 0.02731 0.0 7.07 0.0 0.469 6.421 78.9 4.9671 2.0 242.0 \n", "2 0.02729 0.0 7.07 0.0 0.469 7.185 61.1 4.9671 2.0 242.0 \n", "3 0.03237 0.0 2.18 0.0 0.458 6.998 45.8 6.0622 3.0 222.0 \n", "4 0.06905 0.0 2.18 0.0 0.458 7.147 54.2 6.0622 3.0 222.0 \n", "\n", " PTRATIO B LSTAT \n", "0 15.3 396.90 4.98 \n", "1 17.8 396.90 9.14 \n", "2 17.8 392.83 4.03 \n", "3 18.7 394.63 2.94 \n", "4 18.7 396.90 5.33 " ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "boston_dataset = load_boston()\n", "df = pd.DataFrame(boston_dataset.data, columns=boston_dataset.feature_names)\n", "df.head()" ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "StandardScaler(copy=True, with_mean=True, with_std=True)" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "scaler = StandardScaler()\n", "scaler.fit(df)" ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[-0.41978194, 0.28482986, -1.2879095 , ..., -1.45900038,\n", " 0.44105193, -1.0755623 ],\n", " [-0.41733926, -0.48772236, -0.59338101, ..., -0.30309415,\n", " 0.44105193, -0.49243937],\n", " [-0.41734159, -0.48772236, -0.59338101, ..., -0.30309415,\n", " 0.39642699, -1.2087274 ],\n", " ...,\n", " [-0.41344658, -0.48772236, 0.11573841, ..., 1.17646583,\n", " 0.44105193, -0.98304761],\n", " [-0.40776407, -0.48772236, 0.11573841, ..., 1.17646583,\n", " 0.4032249 , -0.86530163],\n", " [-0.41500016, -0.48772236, 0.11573841, ..., 1.17646583,\n", " 0.44105193, -0.66905833]])" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "array = scaler.transform(df)\n", "array" ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [ { "data": { 
"text/html": [ "
" ], "text/plain": [ " CRIM ZN INDUS CHAS NOX RM AGE \\\n", "0 -0.419782 0.284830 -1.287909 -0.272599 -0.144217 0.413672 -0.120013 \n", "1 -0.417339 -0.487722 -0.593381 -0.272599 -0.740262 0.194274 0.367166 \n", "2 -0.417342 -0.487722 -0.593381 -0.272599 -0.740262 1.282714 -0.265812 \n", "3 -0.416750 -0.487722 -1.306878 -0.272599 -0.835284 1.016303 -0.809889 \n", "4 -0.412482 -0.487722 -1.306878 -0.272599 -0.835284 1.228577 -0.511180 \n", "\n", " DIS RAD TAX PTRATIO B LSTAT \n", "0 0.140214 -0.982843 -0.666608 -1.459000 0.441052 -1.075562 \n", "1 0.557160 -0.867883 -0.987329 -0.303094 0.441052 -0.492439 \n", "2 0.557160 -0.867883 -0.987329 -0.303094 0.396427 -1.208727 \n", "3 1.077737 -0.752922 -1.106115 0.113032 0.416163 -1.361517 \n", "4 1.077737 -0.752922 -1.106115 0.113032 0.441052 -1.026501 " ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_scaled = pd.DataFrame(array, columns=df.columns)\n", "df_scaled.head()" ] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[-1.18321596],\n", " [-0.50709255],\n", " [ 0.16903085],\n", " [ 1.52127766]])" ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Check with a smaller dataset: refit the scaler on four values\n", "data = [[-1], [-0.5], [0], [1]]\n", "scaler.fit(data)\n", "scaler.transform(data)" ] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [], "source": [ "# np.std uses ddof=0 (population standard deviation), the same estimator StandardScaler uses\n", "mean_a = np.array([-1, -0.5, 0, 1]).mean()\n", "std_a = np.array([-1, -0.5, 0, 1]).std()" ] }, { "cell_type": "code", "execution_count": 57, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "-0.125\n", "0.739509972887452\n" ] } ], "source": [ "print(mean_a)\n", "print(std_a)" ] }, { "cell_type": "code", "execution_count": 58, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "-1.1832159566199232" ] }, "execution_count": 58, "metadata": {}, 
"output_type": "execute_result" } ], "source": [ "(-1 - mean_a) / std_a" ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1.52127765851133" ] }, "execution_count": 59, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(data[3][0] - mean_a) / std_a" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.7" } }, "nbformat": 4, "nbformat_minor": 2 }
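The `pd.cut` and `pd.get_dummies` steps used earlier in this notebook can be sketched without pandas to make their semantics explicit. This is a minimal, dependency-free illustration, not pandas' implementation: the helper names `age_category` and `one_hot` and the column prefix `AgeCategory` are hypothetical, while the bin edges and labels are the ones passed to `pd.cut` in the notebook. Intervals are assumed right-closed, as with `pd.cut`'s default `right=True`, so an age of 35 is `young` and 36 is `middle`; values outside (0, 120] get no category, where `pd.cut` would return NaN.

```python
from bisect import bisect_left

# Same edges and labels as pd.cut(df['Age'], bins=[0, 35, 55, 120], labels=[...]) in the notebook
EDGES = [0, 35, 55, 120]
LABELS = ['young', 'middle', 'old']


def age_category(age):
    """Return the label of the right-closed bin (lo, hi] containing age, or None if out of range."""
    if age <= EDGES[0] or age > EDGES[-1]:
        return None  # pd.cut would produce NaN here
    # First upper edge >= age (searching from index 1) identifies the bin whose right edge includes age
    return LABELS[bisect_left(EDGES, age, lo=1) - 1]


def one_hot(category, prefix='AgeCategory'):
    """Encode one label as 0/1 indicator columns, one per possible label (hypothetical helper)."""
    return {f'{prefix}_{label}': int(category == label) for label in LABELS}


ages = [50, 31, 32, 21, 33]  # the first five Age values shown in the notebook output
categories = [age_category(a) for a in ages]
encoded = [one_hot(c) for c in categories]
for age, cat, row in zip(ages, categories, encoded):
    print(age, cat, row)
```

Each encoded row has exactly one indicator set to 1, which is the property `get_dummies` relies on when it replaces the categorical column.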
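The feature-split cells in the notebook keep only the first token of `MVP_Player`. Splitting a full name into first name and surname can be sketched with `str.partition`, which splits at the first space only, so a multi-word surname such as `Da Vinci` stays together. `split_full_name` is a hypothetical helper, and treating everything after the first space as the surname is an assumption that will not hold for every naming convention. In pandas, `df['MVP_Player'].str.split(' ', n=1, expand=True)` splits at the first space in the same way and returns the two parts as columns.

```python
def split_full_name(fullname):
    """Split a full name at the first space into (first_name, rest); rest is '' for one-word names."""
    first, _sep, rest = fullname.strip().partition(' ')
    return first, rest.strip()


# MVP_Player values from the DataFrame built in the notebook
players = ['John Stuart', 'Leo Da Vinci', 'Mike Donatello',
           'Raphael Dolce', 'Bruce Lee', 'Mahatma Gandhi']
split = [split_full_name(p) for p in players]
for full, (first, last) in zip(players, split):
    print(f'{full!r} -> first={first!r}, last={last!r}')
```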
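The manual check in the notebook computes the z-score for two of the four values with NumPy. The same formula, $z = (x - \mu) / \sigma$, can be applied to every value with the standard library alone; this is a sketch, not scikit-learn's implementation, and `z_scores` is a hypothetical helper. It uses `statistics.pstdev`, the population standard deviation (dividing by n, i.e. ddof=0), which is the estimator `StandardScaler` and `np.std` use by default; `statistics.stdev` divides by n - 1 and would give slightly different values.

```python
from statistics import fmean, pstdev


def z_scores(values):
    """Standardize values: z = (x - mean) / population standard deviation."""
    mu = fmean(values)
    sigma = pstdev(values, mu)  # ddof=0, like StandardScaler and np.std
    if sigma == 0:
        raise ValueError('constant input: standard deviation is zero')
    return [(x - mu) / sigma for x in values]


data = [-1, -0.5, 0, 1]  # the small example used in the notebook
scaled = z_scores(data)
print(scaled)
# After scaling, the mean is 0 and the population standard deviation is 1 (up to rounding)
print(fmean(scaled), pstdev(scaled))
```

The scaled values agree with the `StandardScaler` output for the same four values in the notebook to eight decimal places.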